Plan, Attend, Generate: Character-level Neural Machine Translation with Planning in the Decoder
We investigate the integration of a planning mechanism into an
encoder-decoder architecture with an explicit alignment for character-level
machine translation. We develop a model that plans ahead when it computes
alignments between the source and target sequences, constructing a matrix of
proposed future alignments and a commitment vector that governs whether to
follow or recompute the plan. This mechanism is inspired by the strategic
attentive reader and writer (STRAW) model. Our proposed model is end-to-end
trainable with fully differentiable operations. We show that it outperforms a
strong baseline on three character-level decoder neural machine translation tasks
from the WMT'15 corpus. Our analysis demonstrates that our model can compute
qualitatively intuitive alignments and achieves superior performance with fewer
parameters.
Comment: Accepted to Rep4NLP 2017 Workshop at ACL 2017 Conference
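A minimal sketch of the plan-then-commit idea: a plan matrix holds one alignment distribution per upcoming decoding step, and a commitment gate decides whether to follow the stored plan or recompute it. This is only illustrative of the mechanism described in the abstract; all names, shapes, and the hard boolean gate (the paper's operations are fully differentiable) are assumptions, not the model's actual parameterization.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class PlanningAttention:
    """Toy plan-ahead attention: keeps a k-step plan of future alignments
    over the source, plus a commitment switch that governs whether to
    follow the plan or recompute it."""

    def __init__(self, src_len, plan_steps=3):
        self.src_len = src_len
        self.k = plan_steps
        # plan matrix: one alignment distribution per future step,
        # initialized uniform over source positions
        self.plan = np.full((plan_steps, src_len), 1.0 / src_len)
        self.step = 0  # position inside the current plan

    def __call__(self, scores, commit):
        # scores: (k, src_len) relevance scores; commit: recompute flag
        if commit or self.step >= self.k:
            # recompute the whole plan of future alignments
            self.plan = np.stack([softmax(s) for s in scores])
            self.step = 0
        alignment = self.plan[self.step]  # follow the committed plan
        self.step += 1
        return alignment
```

Each call returns a proper distribution over source positions; calls with `commit=False` reuse the stored plan rather than the fresh scores.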
Adversarial Generation of Natural Language
Generative Adversarial Networks (GANs) have gathered a lot of attention from
the computer vision community, yielding impressive results for image
generation. Advances in the adversarial generation of natural language from
noise however are not commensurate with the progress made in generating images,
and still lag far behind likelihood based methods. In this paper, we take a
step towards generating natural language with a GAN objective alone. We
introduce a simple baseline that addresses the discrete output space problem
without relying on gradient estimators and show that it is able to achieve
state-of-the-art results on a Chinese poem generation dataset. We present
quantitative results on generating sentences from context-free and
probabilistic context-free grammars, and qualitative language modeling results.
A conditional version is also described that can generate sequences conditioned
on sentence characteristics.
Comment: 11 pages, 3 figures, 5 tables
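One common way to sidestep the discrete output space without gradient estimators, plausibly in the spirit of the baseline above, is to let the generator emit full softmax distributions over the vocabulary and have the discriminator consume those soft vectors directly, alongside one-hot encodings of real tokens. The sketch below shows only that differentiable interface; the toy sizes, the generator, and the one-parameter discriminator are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

VOCAB = 4  # toy vocabulary size (illustrative)
SEQ = 3    # toy sequence length (illustrative)

def one_hot(tokens, vocab=VOCAB):
    # real sentences enter the discriminator as one-hot rows
    m = np.zeros((len(tokens), vocab))
    m[np.arange(len(tokens)), tokens] = 1.0
    return m

def generator(logits):
    # instead of sampling discrete tokens (non-differentiable),
    # emit the full softmax distribution at every position
    return softmax(logits, axis=-1)

def discriminator(x, w):
    # any differentiable function of a (SEQ, VOCAB) matrix works for
    # both real one-hot inputs and generated soft inputs
    return 1.0 / (1.0 + np.exp(-(x * w).sum()))

rng = np.random.default_rng(0)
w = rng.normal(size=(SEQ, VOCAB))
real = one_hot([1, 3, 0])
fake = generator(rng.normal(size=(SEQ, VOCAB)))
d_real, d_fake = discriminator(real, w), discriminator(fake, w)
```

Because the generated input is a smooth function of the generator's logits, the adversarial loss can be backpropagated end to end with no REINFORCE-style estimator.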
Prédiction et génération de données structurées à l'aide de réseaux de neurones et de décisions discrètes
Deep learning, a subdiscipline of machine learning, is used throughout multiple domains,
including natural language processing. However, several problems in the field remain open,
notably the prediction of long sequences and the generation of natural language. In this
thesis, we present two models that address these problems.
In chapter 1, we add a planning mechanism to sequence-to-sequence models. The mechanism
establishes ahead of time the alignment between the input and output sequences. We show
that this improves the alignment, helps the model converge faster, and requires fewer
parameters. We also show performance gains in neural machine translation, question
generation, and the algorithmic task of finding Eulerian circuits in graphs.
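For reference, the Eulerian-circuit task mentioned above has a classical non-neural solver, Hierholzer's algorithm, sketched below. This illustrates the task the sequence model is trained on, not the thesis's model itself; the undirected-graph representation is an assumption for the sketch.

```python
from collections import defaultdict

def eulerian_circuit(edges):
    """Hierholzer's algorithm: traverse every edge exactly once and
    return to the starting vertex. Assumes the undirected graph is
    connected and every vertex has even degree."""
    graph = defaultdict(list)
    for u, v in edges:
        graph[u].append(v)
        graph[v].append(u)  # undirected: store both directions
    stack, circuit = [edges[0][0]], []
    while stack:
        v = stack[-1]
        if graph[v]:
            u = graph[v].pop()
            graph[u].remove(v)  # consume the edge in both directions
            stack.append(u)
        else:
            circuit.append(stack.pop())
    return circuit[::-1]
```

On a 4-cycle, the solver returns a closed walk of five vertices covering each of the four edges once.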
In chapter 2, we tackle language generation using generative adversarial networks, a
non-trivial problem given the discrete nature of the output space. The model is trained
using only an adversarial loss and without any gradient estimation. We show results on
language modeling, context-free grammar generation, and conditional sentence generation.